{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Very deep networks with repeating elements\n",
    "\n",
    "As we already noticed in AlexNet, the number of layers in networks keeps on increasing. This means that it becomes extremely tedious to write code that piles on one layer after the other manually. Fortunately, programming languages have a wonderful fix for this: subroutines and loops. This way we can express networks as *code*. Just like we would use a for loop to count from 1 to 10, we'll use code to combine layers. The first network that had this structure was VGG. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## VGG\n",
    "\n",
    "We begin with the usual import ritual"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:00:51.744769Z",
     "start_time": "2017-10-18T06:00:51.019959Z"
    }
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import mxnet as mx\n",
    "from mxnet import nd, autograd\n",
    "from mxnet import gluon\n",
    "import numpy as np\n",
    "mx.random.seed(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:00:51.749941Z",
     "start_time": "2017-10-18T06:00:51.746808Z"
    }
   },
   "outputs": [],
   "source": [
    "ctx = mx.gpu()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load up a dataset\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:00:52.252105Z",
     "start_time": "2017-10-18T06:00:51.752991Z"
    }
   },
   "outputs": [],
   "source": [
    "batch_size = 64\n",
    "\n",
    "def transform(data, label):\n",
    "    return nd.transpose(data.astype(np.float32), (2,0,1))/255, label.astype(np.float32)\n",
    "\n",
    "train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),\n",
    "                                      batch_size, shuffle=True)\n",
    "test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),\n",
    "                                     batch_size, shuffle=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The VGG architecture\n",
    "\n",
    "A key aspect of VGG was to use many convolutional blocks with relatively narrow kernels, followed by a max-pooling step and to repeat this block multiple times. What is pretty neat about the code below is that we use functions to *return* network blocks. These are then combined to larger networks (e.g. in `vgg_stack`) and this allows us to construct VGG from components. What is particularly useful here is that we can use it to reparameterize the architecture simply by changing a few lines rather than adding and removing many lines of network definitions. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:00:52.283905Z",
     "start_time": "2017-10-18T06:00:52.254227Z"
    }
   },
   "outputs": [],
   "source": [
    "from mxnet.gluon import nn\n",
    "\n",
    "def vgg_block(num_convs, channels):\n",
    "    out = nn.Sequential()\n",
    "    for _ in range(num_convs):\n",
    "        out.add(nn.Conv2D(channels=channels, kernel_size=3,\n",
    "                      padding=1, activation='relu'))\n",
    "    out.add(nn.MaxPool2D(pool_size=2, strides=2))\n",
    "    return out\n",
    "\n",
    "def vgg_stack(architecture):\n",
    "    out = nn.Sequential()\n",
    "    for (num_convs, channels) in architecture:\n",
    "        out.add(vgg_block(num_convs, channels))\n",
    "    return out\n",
    "\n",
    "num_outputs = 10\n",
    "architecture = ((1,64), (1,128), (2,256), (2,512))\n",
    "net = nn.Sequential()\n",
    "with net.name_scope():\n",
    "    net.add(vgg_stack(architecture))\n",
    "    net.add(nn.Flatten())\n",
    "    net.add(nn.Dense(512, activation=\"relu\"))\n",
    "    net.add(nn.Dropout(.5))\n",
    "    net.add(nn.Dense(512, activation=\"relu\"))\n",
    "    net.add(nn.Dropout(.5))\n",
    "    net.add(nn.Dense(num_outputs))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:00:53.879036Z",
     "start_time": "2017-10-18T06:00:52.285901Z"
    }
   },
   "outputs": [],
   "source": [
    "net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:00:53.920533Z",
     "start_time": "2017-10-18T06:00:53.898827Z"
    }
   },
   "outputs": [],
   "source": [
    "trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .05})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Softmax cross-entropy loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:00:53.941011Z",
     "start_time": "2017-10-18T06:00:53.922904Z"
    }
   },
   "outputs": [],
   "source": [
    "softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:00:53.962279Z",
     "start_time": "2017-10-18T06:00:53.943086Z"
    }
   },
   "outputs": [],
   "source": [
    "def evaluate_accuracy(data_iterator, net):\n",
    "    acc = mx.metric.Accuracy()\n",
    "    for d, l in data_iterator:\n",
    "        data = d.as_in_context(ctx)\n",
    "        label = l.as_in_context(ctx)\n",
    "        output = net(data)\n",
    "        predictions = nd.argmax(output, axis=1)\n",
    "        acc.update(preds=predictions, labels=label)\n",
    "    return acc.get()[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-10-18T06:02:36.461653Z",
     "start_time": "2017-10-18T06:00:53.965101Z"
    }
   },
   "outputs": [],
   "source": [
    "###########################\n",
    "#  Only one epoch so tests can run quickly, increase this variable to actually run\n",
    "###########################\n",
    "epochs = 1\n",
    "smoothing_constant = .01\n",
    "\n",
    "for e in range(epochs):\n",
    "    for i, (d, l) in enumerate(train_data):\n",
    "        data = d.as_in_context(ctx)\n",
    "        label = l.as_in_context(ctx)\n",
    "        with autograd.record():\n",
    "            output = net(data)\n",
    "            loss = softmax_cross_entropy(output, label)\n",
    "        loss.backward()\n",
    "        trainer.step(data.shape[0])\n",
    "        \n",
    "        ##########################\n",
    "        #  Keep a moving average of the losses\n",
    "        ##########################\n",
    "        curr_loss = nd.mean(loss).asscalar()\n",
    "        moving_loss = (curr_loss if ((i == 0) and (e == 0)) \n",
    "                       else (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss)\n",
    "        \n",
    "        if i > 0 and i % 200 == 0:\n",
    "            print('Batch %d. Loss: %f' % (i, moving_loss))\n",
    "            \n",
    "    test_accuracy = evaluate_accuracy(test_data, net)\n",
    "    train_accuracy = evaluate_accuracy(train_data, net)\n",
    "    print(\"Epoch %s. Loss: %s, Train_acc %s, Test_acc %s\" % (e, moving_loss, train_accuracy, test_accuracy))    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next\n",
    "[Batch normalization from scratch](../chapter04_convolutional-neural-networks/cnn-batch-norm-scratch.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For whinges or inquiries, [open an issue on  GitHub.](https://github.com/zackchase/mxnet-the-straight-dope)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
